Nonparametric statistical analysis for multiple comparison of machine learning regression algorithms

Authors

  • Bogdan Trawinski
  • Magdalena Smetek
  • Zbigniew Telec
  • Tadeusz Lasota
Abstract

In the paper we presented guidelines for applying nonparametric statistical tests and post hoc procedures devised to perform multiple comparisons of machine learning algorithms. We emphasized that it is necessary to distinguish between pairwise and multiple comparison tests. We showed that the pairwise Wilcoxon test, when applied to multiple comparisons, leads to overoptimistic conclusions. We carried out an intensive normality examination employing ten different tests, showing that the output of machine learning algorithms for regression problems does not satisfy normality requirements. We conducted experiments on nonparametric statistical tests and post hoc procedures designed for multiple 1×N and N×N comparisons with six different neural regression algorithms over 29 benchmark regression data sets. Our investigation proved the usefulness and strength of multiple comparison statistical procedures for analysing and selecting machine learning algorithms.
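To illustrate why uncorrected pairwise tests can give overoptimistic conclusions, the following minimal Python sketch computes the family-wise error rate for all pairwise comparisons among six algorithms and contrasts naive thresholding with a step-down correction. It rests on assumptions not taken from the abstract: the tests are treated as independent (a simplification, since all comparisons share the same data sets), the significance level is 0.05, the Holm procedure is used as one representative post hoc correction (the abstract does not name the procedures the authors applied), and the p-values are hypothetical.

```python
from math import comb


def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false rejection across num_tests independent tests."""
    return 1.0 - (1.0 - alpha) ** num_tests


def holm_reject(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Holm step-down procedure; returns one reject flag per hypothesis, in input order."""
    m = len(p_values)
    # Visit hypotheses from the smallest p-value to the largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The threshold loosens as hypotheses are rejected: alpha/m, alpha/(m-1), ...
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # the first failure stops all further rejections
    return reject


# Six algorithms compared pairwise (N x N) give C(6, 2) comparisons.
num_algorithms = 6
num_pairs = comb(num_algorithms, 2)
fwer = familywise_error_rate(num_pairs)

# Hypothetical unadjusted p-values, for illustration only.
example_p = [0.001, 0.004, 0.020, 0.030, 0.200]
naive = [p <= 0.05 for p in example_p]
holm = holm_reject(example_p)

print(f"pairwise comparisons: {num_pairs}")
print(f"family-wise error rate without correction: {fwer:.3f}")
print(f"naive rejections: {sum(naive)}, Holm rejections: {sum(holm)}")
```

Under these assumptions, the 15 uncorrected comparisons carry a better than even chance of at least one false rejection, and the correction keeps fewer of the hypothetical differences as significant than naive thresholding does.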

Similar resources

Nonparametric Statistical Analysis of Machine Learning Algorithms for Regression Problems

Several experiments were conducted to apply recently proposed statistical procedures recommended for analysing multiple 1×n and n×n comparisons of machine learning algorithms. Eleven regression algorithms, comprising 5 deterministic and 6 neural network ones implemented in the data mining system KEEL, were employed. All experiments were performed using 29 benchmark datasets for regres...


Machine learning algorithms in air quality modeling

Modern studies in the field of environmental science and engineering show that deterministic models struggle to capture the relationship between the concentration of atmospheric pollutants and their emission sources. Recent advances in statistical modeling based on machine learning approaches have emerged as a solution to tackle these issues. It is a fact that input variable type largely affec...


Application of ensemble learning techniques to model the atmospheric concentration of SO2

In view of pollution prediction modeling, the study adopts homogeneous (random forest, bagging, and additive regression) and heterogeneous (voting) ensemble classifiers to predict the atmospheric concentration of sulphur dioxide. For model validation, results were compared against widely known single base classifiers such as support vector machine, multilayer perceptron, linear regression and re...


Comparison of Ensemble Approaches: Mixture of Experts and AdaBoost for a Regression Problem

Two machine learning approaches, mixture of experts and AdaBoost.R2, were adjusted to the real-world regression problem of predicting the prices of residential premises based on historical data of sales/purchase transactions. Computationally intensive experiments were conducted to compare empirically the prediction accuracy of ensemble models generated by the methods. The analysis of t...


Using Machine Learning Algorithms for Automatic Cyber Bullying Detection in Arabic Social Media

Social media allows people to interact and express their thoughts or feelings about different subjects. However, some users may write offensive twits to others via social media, which is known as cyber bullying. Successful prevention depends on automatically detecting malicious messages. Automatic detection of bullying in the text of social media by analyzing the text "twits" via one of the machine l...



Journal:
  • Applied Mathematics and Computer Science

Volume 22

Publication date: 2012